Skip to content

Conversation

@aheejin
Copy link
Member

@aheejin aheejin commented Jan 20, 2026

In ModuleSplitter::shareImportableItems, we now check if each module element is used by a secondary module and export & import it only when it is necessary. This removes the need to run RemoveUnusedModuleElements pass for secondary modules at the end, and also removes unnecessary exports from the primary module in case they are not used in any secondary modules.

This reduces the running time on a 'acx_gallery' reproducer provided by @biggs0125 before from 78.3s to 16.4s in average on my machine, reducing it by around 79%.

In that 'acx_gallery' program, this PR reduces the size of the primary module by 13% and its export section by 62.1%, and reduces the all combined module size (primary + secondaries) by 7.4%.


Detailed analysis for 'acx_gallery', a case where we split a module into 301 (1 primary + 300 secondary) modules:

  • Before this PR:
    Time: 78.3s
    Task breakdown:
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               34472.0000          49.90%
removeUnusedSecondaryElements                      24720.2000          35.78%
moveSecondaryFunctions                              4938.0500           7.15%
writeModule_secondary                               2892.7603           4.19%
writeModule_primary                                  897.3630           1.30%
exportImportCalledPrimaryFunctions                   667.9780           0.97%
indirectReferencesToSecondaryFunctions               201.9830           0.29%
indirectCallsToSecondaryFunctions                    133.2190           0.19%
classifyFunctions                                    118.3200           0.17%
setupTablePatching                                    42.6402           0.06%
thunkExportedSecondaryFunctions                        0.9307           0.00%
initExportedPrimaryFuncs                               0.8008           0.00%
---------------------------------------------------------------------------

  • After this PR:
    Time: 16.4s
    Task breakdown:
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
moveSecondaryFunctions                              5341.1700          43.94%
writeModule_secondary                               2960.8319          24.36%
shareImportableItems                                1568.6400          12.91%
writeModule_primary                                  765.7240           6.30%
exportImportCalledPrimaryFunctions                   762.9670           6.28%
indirectReferencesToSecondaryFunctions               362.3300           2.98%
classifyFunctions                                    204.4540           1.68%
indirectCallsToSecondaryFunctions                    139.2110           1.15%
setupTablePatching                                    45.2461           0.37%
thunkExportedSecondaryFunctions                        3.0373           0.02%
initExportedPrimaryFuncs                               0.9982           0.01%
---------------------------------------------------------------------------
Overall Total                                      12154.6095         100.00%

We can see that shareImportableItems, which took up the largest share, has been reduced significantly, and removeUnusedSecondaryElements, which was the second largest, is not necessary anymore.


Code size

  • Primary module's export section size: -62.1%

  • With the name section present

    • Primary module size: -8%
    • All combined module size: -4.4%
  • With the name section stripped from the result

    • Primary module size: -13%
    • All combined module size: -7.4%

In `ModuleSplitter::shareImportableItems`, we now check if each module
element is used by a secondary module and export & import it only when
it is necessary. This removes the need to run RemoveUnusedModuleElements
pass for secondary modules at the end, and also removes unnecessary
exports from the primary module in case they are not used in any
secondary modules.

This reduces the running time on a 'acx_gallery' reproducer provided by
@biggs0125 before from 78.3s to 16.4s in average on my machine, reducing
it by around 79%.

In that 'acx_gallery' program, this PR reduces the size of the primary
module by 8% and reduces the all combined module size (primary +
secondaries) by 3.7%.

---

Detailed analysis for 'acx_gallery', a case where we split a module into
301 (1 primary + 300 secondary) modules:

- Before this PR:
Time: 78.3s
Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
shareImportableItems                               34472.0000          49.90%
removeUnusedSecondaryElements                      24720.2000          35.78%
moveSecondaryFunctions                              4938.0500           7.15%
writeModule_secondary                               2892.7603           4.19%
writeModule_primary                                  897.3630           1.30%
exportImportCalledPrimaryFunctions                   667.9780           0.97%
indirectReferencesToSecondaryFunctions               201.9830           0.29%
indirectCallsToSecondaryFunctions                    133.2190           0.19%
classifyFunctions                                    118.3200           0.17%
setupTablePatching                                    42.6402           0.06%
thunkExportedSecondaryFunctions                        0.9307           0.00%
initExportedPrimaryFuncs                               0.8008           0.00%
---------------------------------------------------------------------------
Overall Total
```

- After this PR:
Time: 16.4s
Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
moveSecondaryFunctions                              5341.1700          43.94%
writeModule_secondary                               2960.8319          24.36%
shareImportableItems                                1568.6400          12.91%
writeModule_primary                                  765.7240           6.30%
exportImportCalledPrimaryFunctions                   762.9670           6.28%
indirectReferencesToSecondaryFunctions               362.3300           2.98%
classifyFunctions                                    204.4540           1.68%
indirectCallsToSecondaryFunctions                    139.2110           1.15%
setupTablePatching                                    45.2461           0.37%
thunkExportedSecondaryFunctions                        3.0373           0.02%
initExportedPrimaryFuncs                               0.9982           0.01%
---------------------------------------------------------------------------
Overall Total                                      12154.6095         100.00%
```

We can see that `shareImportableItems`, which took up the largest share,
has been reduced significantly, and `removeUnusedSecondaryElements`,
which was the second largest, is not necessary anymore.
@aheejin aheejin requested a review from tlively January 20, 2026 17:03

#define DELEGATE_FIELD_NAME_KIND(id, field, kind) \
if (cast->field.is()) { \
if (kind == ModuleItemKind::Global) { \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A switch here might be better than an if-else chain. If nothing else, it would give us a compiler error if we ever add a new ModuleItemKind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 8642d46

Comment on lines 975 to 997
NameCollector collector(used);
for (auto& global : secondary.globals) {
if (!global->imported()) {
collector.walk(global->init);
}
}
for (auto& segment : secondary.dataSegments) {
used.memories.insert(segment->memory);
if (segment->offset) {
collector.walk(segment->offset);
}
}
for (auto& segment : secondary.elementSegments) {
if (segment->table.is()) {
used.tables.insert(segment->table);
}
if (segment->offset) {
collector.walk(segment->offset);
}
for (auto* item : segment->data) {
collector.walk(item);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this except checking for tables and memories in active segments could be replaced with a call to collector.walkModuleCode().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 8b7850a
And added a missing check e859242


for (auto& table : primary.tables) {
auto secondaryTable = secondary.getTableOrNull(table->name);
if (!secondaryTable && !used.tables.count(table->name)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this condition different from the others? Maybe worth a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 5e7b376

Comment on lines +1177 to +1199
(type $0 (func))
(import "primary" "%a" (func $0 (exact (type $0))))
(import "primary" "%c" (func $1 (exact (type $0))))
(import "primary" "%d" (func $2 (exact (type $0))))
(import "primary" "already_exported" (func $3 (exact (type $0))))
(import "primary" "%e" (func $4 (exact (type $0))))
(import "primary" "%f" (func $5 (exact (type $0))))
(import "primary" "%g" (func $6 (exact (type $0))))
(import "primary" "%b" (func $7 (exact (type $0))))
(import "primary" "%h" (func $8 (exact (type $0))))
(import "primary" "%i" (func $9 (exact (type $0))))
(func $call (type $0)
(call $0)
(call $1)
(call $2)
(call $3)
(call $4)
(call $5)
(call $6)
(call $7)
(call $8)
(call $9)
)
Copy link
Member Author

@aheejin aheejin Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few test cases in this file that have new functions in secondary modules. This is because we don't run RemoveUnusedModuleElements on secondaries anymore. In practice this wouldn't be a problem because if a function is not used anywhere it would have been optimized away before splitting.

Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@aheejin
Copy link
Member Author

aheejin commented Jan 26, 2026

Sorry, I pushed code for time measuring by mistake. Reverted.

@aheejin
Copy link
Member Author

aheejin commented Jan 27, 2026

Updated more code size stats in the PR description.
(I also restored some incorrect time measurements: it's ~80% reduction, not 90%)

@aheejin aheejin merged commit d9e6ed3 into WebAssembly:main Jan 27, 2026
17 checks passed
@aheejin aheejin deleted the wasm_split_share_needed branch January 27, 2026 09:46
aheejin added a commit to aheejin/binaryen that referenced this pull request Jan 28, 2026
`moveSecondaryFunctions`, despite the name, actually copies functions to
secondary modules and removes them from the primary module. This
prevents this overhead by actually moving the functions to secondary
modules. This reduces the running time on acx_gallery (provided by
@biggs0125) by 31%.

---

- Before this PR:
  Time: 16.4s
  Task breakdown:
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
moveSecondaryFunctions                              5341.1700          43.94%
writeModule_secondary                               2960.8319          24.36%
shareImportableItems                                1568.6400          12.91%
writeModule_primary                                  765.7240           6.30%
exportImportCalledPrimaryFunctions                   762.9670           6.28%
indirectReferencesToSecondaryFunctions               362.3300           2.98%
classifyFunctions                                    204.4540           1.68%
indirectCallsToSecondaryFunctions                    139.2110           1.15%
setupTablePatching                                    45.2461           0.37%
thunkExportedSecondaryFunctions                        3.0373           0.02%
initExportedPrimaryFuncs                               0.9982           0.01%
---------------------------------------------------------------------------
Overall Total                                      12154.6095         100.00%
```

- After this PR:
  Time: 11.3s
```
Task                                          Total Time (ms)      Percentage
---------------------------------------------------------------------------
writeModule_secondary                               2908.4728          43.52%
shareImportableItems                                1562.7800          23.39%
exportImportCalledPrimaryFunctions                   786.6140          11.77%
writeModule_primary                                  751.3420          11.24%
indirectReferencesToSecondaryFunctions               250.7260           3.75%
indirectCallsToSecondaryFunctions                    149.7940           2.24%
classifyFunctions                                    126.4520           1.89%
moveSecondaryFunctions                               106.0680           1.59%
setupTablePatching                                    38.9292           0.58%
initExportedPrimaryFuncs                               0.8816           0.01%
thunkExportedSecondaryFunctions                        0.4937           0.01%
---------------------------------------------------------------------------
Overall Total                                       6682.5533         100.00%
```

We can see the time spent in `moveSecondaryFunctions`, which used to
take nearly half of the running time (after WebAssembly#8221), is now negligible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants